Goto

Collaborating Authors

 bn statistics




FedL2P: Federated Learning to Personalize

Neural Information Processing Systems

In this paper, we consider the federated meta-learning problem of learning personalization strategies. Specifically, we consider meta-nets that induce the batch-norm and learning rate parameters for each client given local data statistics.


Overcoming Recency Bias of Normalization Statistics in Continual Learning: Balance and Adaptation

Neural Information Processing Systems

Continual learning entails learning a sequence of tasks and balancing their knowledge appropriately. With limited access to old training samples, much of the current work in deep neural networks has focused on overcoming catastrophic forgetting of old tasks in gradient-based optimization. However, the normalization layers provide an exception, as they are updated interdependently by the gradient and statistics of currently observed training samples, which require specialized strategies to mitigate recency bias. In this work, we focus on the most popular Batch Normalization (BN) and provide an in-depth theoretical analysis of its sub-optimality in continual learning. Our analysis demonstrates the dilemma between balance and adaptation of BN statistics for incremental tasks, which potentially affects training stability and generalization. Targeting on these particular challenges, we propose Adaptive Balance of BN (AdaB$^2$N), which incorporates appropriately a Bayesian-based strategy to adapt task-wise contributions and a modified momentum to balance BN statistics, corresponding to the training and testing stages. By implementing BN in a continual learning fashion, our approach achieves significant performance gains across a wide range of benchmarks, particularly for the challenging yet realistic online scenarios (e.g., up to 7.68\%, 6.86\% and 4.26\% on Split CIFAR-10, Split CIFAR-100 and Split Mini-ImageNet, respectively). Our code is available at https://github.com/lvyilin/AdaB2N.



Overcoming Recency Bias of Normalization Statistics in Continual Learning: Balance and Adaptation

Neural Information Processing Systems

Continual learning entails learning a sequence of tasks and balancing their knowledge appropriately. With limited access to old training samples, much of the current work in deep neural networks has focused on overcoming catastrophic forgetting of old tasks in gradient-based optimization.


FedL2P: Federated Learning to Personalize

Neural Information Processing Systems

In this paper, we consider the federated meta-learning problem of learning personalization strategies. Specifically, we consider meta-nets that induce the batch-norm and learning rate parameters for each client given local data statistics.



Gradient Inversion with Generative Image Prior

Neural Information Processing Systems

However, a gradient is often insufficient to reconstruct the user data without any prior knowledge. By exploiting a generative model pretrained on the data distribution, we demonstrate that data privacy can be easily breached.


Higher-Order Asymptotics of Test-Time Adaptation for Batch Normalization Statistics

arXiv.org Machine Learning

This study develops a higher-order asymptotic framework for test-time adaptation (TTA) of Batch Normalization (BN) statistics under distribution shift by integrating classical Edgeworth expansion and saddlepoint approximation techniques with a novel one-step M-estimation perspective. By analyzing the statistical discrepancy between training and test distributions, we derive an Edgeworth expansion for the normalized difference in BN means and obtain an optimal weighting parameter that minimizes the mean-squared error of the adapted statistic. Reinterpreting BN TTA as a one-step M-estimator allows us to derive higher-order local asymptotic normality results, which incorporate skewness and other higher moments into the estimator's behavior. Moreover, we quantify the trade-offs among bias, variance, and skewness in the adaptation process and establish a corresponding generalization bound on the model risk. The refined saddlepoint approximations further deliver uniformly accurate density and tail probability estimates for the BN TTA statistic. These theoretical insights provide a comprehensive understanding of how higher-order corrections and robust one-step updating can enhance the reliability and performance of BN layers in adapting to changing data distributions.